System and methodology providing compiler-assisted refactoring

ABSTRACT

A system providing an improved method for compiler-assisted refactoring of a software application is described. Upon receiving a request for refactoring of a software application (e.g., changing a given symbol of the application), the binary files of the application are parsed to identify those binary files containing references to the given symbol. The source files of the identified binary files are then retrieved and fed into a compiler. The compiler is used to generate a list of all uses of the given symbol in the software application. This list includes not only the text name of the symbol, but also type information and position information regarding its location(s) in the source file. Based upon the list, changes are applied to the software application.

RELATED APPLICATIONS

The present application is related to and claims the benefit of priority of the following commonly-owned provisional application(s): application serial No. 60/376,402, filed Apr. 29, 2002, entitled “System and Methodology Providing Compiler-Assisted Refactoring”, of which the present application is a non-provisional application thereof. The disclosure of the foregoing application is hereby incorporated by reference in its entirety, including any appendices or attachments thereof, for all purposes.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to a system providing methods for facilitating development and maintenance of software applications or systems, with particular emphasis on a compiler-assisted method for refactoring of software systems.

2. Description of the Background Art

Before a digital computer may accomplish a desired task, it must receive an appropriate set of instructions. Executed by the computer's microprocessor, these instructions, collectively referred to as a “computer program,” direct the operation of the computer. Expectedly, the computer must understand the instructions which it receives before it may undertake the specified activity.

Owing to their digital nature, computers essentially only understand “machine code,” i.e., the low-level, minute instructions for performing specific tasks: the sequence of ones and zeros that are interpreted as specific instructions by the computer's microprocessor. Since machine language or machine code is the only language computers actually understand, all other programming languages represent ways of structuring human language so that humans can get computers to perform specific tasks. While it is possible for humans to compose meaningful programs in machine code, practically all software development today employs one or more of the available programming languages. The most widely used programming languages are the “high-level” languages, such as C, Pascal, or more recently Java. These languages allow data structures and algorithms to be expressed in a style of writing that is easily read and understood by fellow programmers.

A program called a “compiler” translates these instructions into the requisite machine language. In the context of this translation, the program written in the high-level language is called the “source code” or source program. The ultimate output of the compiler is a compiled module such as a compiled C “object module,” which includes instructions for execution ultimately by a target processor, or a compiled Java class, which includes bytecodes for execution ultimately by a Java virtual machine. A Java compiler generates platform-neutral “bytecodes,” an architecturally neutral, intermediate format designed for deploying application code efficiently to multiple platforms.

Java bytecodes are designed to be easy to interpret on any machine. Bytecodes are essentially high-level, machine-independent instructions for a hypothetical or “virtual” machine that is implemented by the Java interpreter and runtime system. The virtual machine, which is actually a specification of an abstract machine for which a Java language compiler generates bytecode, must be available for the various hardware/software platforms on which an application is to run. The Java interpreter executes Java bytecode directly on any machine to which the interpreter and runtime system of Java have been ported. In this manner, the same Java language bytecode runs on any platform supported by Java.

Conventionally, creation of a software program or system includes creation of individual source code modules. This approach simplifies program development by dividing functionality available in the program into separate source modules. When multiple source modules are employed for creating a program, interdependencies between the individual modules often exist. Program logic in one module can, for instance, reference variables, methods, objects, and symbols imported from another module. By the very same token, that module can also export its own methods, objects, and symbols, making them available for use by other modules.

“Visual” development environments, such as Borland's JBuilder™, are the preferred application development environments for quickly creating production applications. Such environments are characterized by an integrated development environment (IDE) providing a form painter, a property getter/setter manager (“inspector”), a project manager, a tool palette (with objects which the user can drag and drop on forms), an editor, and a compiler. In general operation, the user “paints” objects on one or more forms, using the form painter. Attributes and properties of the objects on the forms can be modified using the property manager or inspector. In conjunction with this operation, the user attaches or associates program code with particular objects on screen (e.g., button object); the editor is used to edit program code which has been attached to particular objects. After the program code has been developed, the compiler is used to generate binary code (e.g., Java bytecode) for execution on a machine (e.g., a Java virtual machine).

Although visual development environments enable applications to be created quickly, problems remain with the development, implementation, and maintenance of production applications. One problem is that when a large software program or application evolves over time it is common that the initial design gets lost as features that were not in the original specification are added to the application. One way of dealing with this problem of making changes is to design everything with the maximum amount of flexibility. However, this will often lead to unnecessary complexity in the software application, as it is unknown beforehand which parts of the application will require this additional flexibility. Irrespective of how well a system is initially designed or developed, the system is typically modified from time to time during its useful life to improve performance, to accommodate changing needs, to make the system easier to maintain, or for various other reasons. However, during the process of adding features not envisioned in the original specification or otherwise making modifications to the system, one must track how particular terms are defined and used by the system to properly develop the system modifications and to avoid introducing errors during this development process. Specifically, because of interdependencies between modules, when a particular source module is modified (e.g., edited by a developer), the developer must ensure that such modifications are compatible with the other modules of the program. A particular concern is, therefore, that a given change might “break” the system, because the change is incompatible with other, dependent modules of the system.

“Refactoring” is a practice of making structured changes to software applications or systems which add the desired flexibility, but keep the functionality of the system the same. Refactoring involves taking small individual steps that are well defined and that can be applied in succession to yield a more significant change in the application. For example, a developer may wish to perform a “rename refactoring” to change the name of a particular module (e.g., a class name in a Java program). In order to make this change, the user must locate the definition of this class (i.e., the source code for the class) as well as all uses of the class in other portions of the system. In the case of a class name in a Java program, the class name is typically used not only for defining a variable, but also for constructing instances (or objects) of that class and accessing static members of the class (i.e., class variables). Another example of refactoring may involve moving a specified class to a new package (referred to as “move refactoring”).
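By way of illustration only, the three roles a class name plays in a Java program (variable declaration, instance construction, and static member access) may be sketched as follows. The class names `Account` and `ClassNameUsesDemo` are hypothetical and do not appear in the described system; a rename refactoring of `Account` would have to update every marked use.

```java
// Hypothetical class used only for illustration.
final class Account {
    static int openCount = 0;          // static member (class variable)
    final String owner;

    Account(String owner) {
        this.owner = owner;
        openCount++;
    }
}

final class ClassNameUsesDemo {
    /** Returns how many accounts this call created, read through the class name. */
    static int run() {
        int before = Account.openCount;     // use 3: static member access
        Account first;                      // use 1: variable declaration (type position)
        first = new Account("alice");       // use 2: instance construction
        Account second = new Account("bob");
        return Account.openCount - before;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Each commented use refers to the same class but appears in a different syntactic position, which is why locating all of them requires more than finding the declaration.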

Refactoring of a system may be small or extensive, but even small changes can introduce errors or “bugs” into the system. Accordingly, refactoring must be done correctly and completely in order to be effective. Good refactoring requires a mechanism for quickly and accurately identifying definitions and usage of a given symbol in a plurality of source files. The “symbols” that may be involved in refactoring include, for example, package names, class names, interfaces, methods, fields, variables, and properties. Identification of definitions and usage of a given symbol enables refactoring to be performed responsibly and durably so that no bugs are introduced and no behavior is changed beyond the desired improvements in features, performance, and/or maintainability.

The simplest approach for handling refactoring is to use a textual search and replace. However, this approach has the disadvantages of being both slow and inaccurate as refactoring involves more than a simple search and replace task. References must all be accounted for and properly handled, while patterns must be recognized so that, for instance, overloaded names are handled correctly. When a rename refactoring is performed on an overloaded class name, the class's new name must be reflected in the class declaration and in every instance of that class and every other reference to that class. However, the new name must only be reflected in the target class, not in the other classes that share its original name or their declarations, instances, references, methods, and the like. For instance, a class name may also be used as part of a method name in another class. A simple search and replace cannot be performed as one must understand the context in which each instance of the name or symbol is used in various portions of a large system. All told, a textual search and replace is a very inefficient tool for handling a complex operation of this nature, as it requires a user to manually review each usage of the target symbol (e.g., class name) to determine whether or not the symbol should be changed in that particular instance.
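The shortcoming of a purely textual approach may be illustrated with a minimal sketch. The source snippet and the names `Order`, `OrderManager`, and `TextualRenameDemo` below are hypothetical, chosen only to show a class name that also occurs inside another class name and inside a method name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TextualRenameDemo {
    // Hypothetical source text: "Order" also appears inside "OrderManager" and "getOrder".
    static final String SOURCE =
          "class Order { }\n"
        + "class OrderManager {\n"
        + "    Order getOrder() { return new Order(); }\n"
        + "}\n";

    /** Plain substring replacement: rewrites every occurrence, wanted or not. */
    static String naiveRename(String source, String oldName, String newName) {
        return source.replace(oldName, newName);
    }

    /** Whole-word replacement: better, but still has no knowledge of types or scopes. */
    static String wholeWordRename(String source, String oldName, String newName) {
        Pattern wholeWord = Pattern.compile("\\b" + Pattern.quote(oldName) + "\\b");
        return wholeWord.matcher(source).replaceAll(Matcher.quoteReplacement(newName));
    }

    public static void main(String[] args) {
        System.out.println(naiveRename(SOURCE, "Order", "Invoice"));
        System.out.println(wholeWordRename(SOURCE, "Order", "Invoice"));
    }
}
```

The substring version also renames `OrderManager` and `getOrder`, which were never meant to change. The whole-word version avoids that particular error, yet it still cannot distinguish two classes that share the name `Order` in different packages, because that distinction depends on type information rather than on text.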

A slightly more elaborate approach involves combining the textual search with some language knowledge in the form of a source analysis tool. This type of source analysis tool may enable a user to at least narrow down possible candidates for replacement. Another approach is to use a source analysis tool to build an additional cross-reference index of the usage of symbols in the source code. Unfortunately, building an additional cross-reference index requires a separate pass to analyze the structure of the source code, before performing the refactoring. In addition, a problem with both of these approaches is that building this type of automated source analysis tool for a particular programming language largely involves recreating the compiler for the language in order to understand the context in which a particular symbol or token is used in a program. However, recreating the compiler does not take advantage of the native compiler that is available for the language. In addition, the process of attempting to recreate a compiler creates the potential for introducing errors as a result of differences between the newly created compiler and the native compiler that was used in development and implementation of the system.

A better approach is sought for refactoring which leverages a proven compiler that is certified for the language and is used in program development and implementation. The present invention fulfills this and other needs.

GLOSSARY

The following definitions are offered for purposes of illustration, not limitation, in order to assist with understanding the discussion that follows.

Bytecode: A virtual machine executes virtual machine low-level code instructions called “bytecodes.” Both the Sun Microsystems Java virtual machine and the Microsoft .NET virtual machine provide a compiler to transform the respective source program (i.e., a Java program or a C# program, respectively) into virtual machine bytecodes.

Compiler: A compiler is a program which translates source code into binary code to be executed by a computer. The compiler derives its name from the way it works, looking at the entire piece of source code and collecting and reorganizing the instructions. Thus, a compiler differs from an interpreter which analyzes and executes each line of code in succession, without looking at the entire program. A Java compiler translates source code written in the Java programming language into bytecode for the Java virtual machine.

Interpreter: An interpreter is a module that alternately decodes and executes every statement in some body of code. A Java runtime interpreter decodes and executes bytecode for the Java virtual machine.

Java: Java is a general purpose programming language developed by Sun Microsystems. Java is an object-oriented language similar to C++, but simplified to eliminate language features that cause common programming errors. Java source code files (files with a .java extension) are compiled into a format called bytecode (files with a .class extension), which can then be executed by a Java interpreter. Compiled Java code can run on most computers because Java interpreters and runtime environments, known as Java virtual machines (VMs), exist for most operating systems, including UNIX, the Macintosh OS, and Windows. Bytecode can also be converted directly into machine language instructions by a just-in-time (JIT) compiler. Further description of the Java Language environment can be found in the technical, trade, and patent literature; see e.g., Gosling, J. et al., “The Java Language Environment: A White Paper,” Sun Microsystems Computer Company, October 1995, the disclosure of which is hereby incorporated by reference.

Refactoring: Refactoring is the process of making small, structured changes to improve the internal structure of an existing software system without changing its observable behavior. For example, if a user wants to add new functionality to a software system, he or she may decide to refactor the program first to simplify the addition of new functionality and to make the program easier to maintain over time. A software system that undergoes continuous change, such as having new functionality added to its original design, will eventually become more complex and can become disorganized as it grows, losing its original design structure. Refactoring of a software system facilitates building on an existing program in a structured manner that avoids introducing new bugs and problems into the system.

SUMMARY OF THE INVENTION

A system providing an improved method for compiler-assisted refactoring of a software application is described. Upon receiving a request for refactoring of a software application (e.g., changing a given symbol or element of the application) from a developer or user, the binary files of the application are parsed to identify those binary files containing references to the given symbol or element. The source files of the identified binary files are then retrieved and fed into a compiler. The compiler is used to generate a list of all uses of the given symbol or element in the software application. This list includes not only the text name of the symbol or element, but also type information and position information regarding its location(s) in the source file. Based upon the list, changes are applied to the software application.
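The first two stages summarized above (selecting the binary files that reference the symbol, then locating their source files) may be sketched as follows. This is a simplified illustration under stated assumptions, not the described implementation: the map of referenced symbols stands in for the result of parsing each binary file, the class name `RefactoringPrefilterSketch` and its methods are hypothetical, and the mapping from a .class file name to a .java file name ignores package directories.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RefactoringPrefilterSketch {
    /** Assumption: Foo.class and Foo$Inner.class both originate from Foo.java. */
    static String sourceFor(String classFile) {
        return classFile.replaceAll("(\\$[^.]*)?\\.class$", ".java");
    }

    /**
     * Keeps only the binaries whose (already parsed) reference set mentions the
     * symbol, and returns the distinct source files that should go to the compiler.
     */
    static List<String> sourcesToCompile(Map<String, Set<String>> referencesByClassFile,
                                         String symbol) {
        Set<String> sources = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> entry : referencesByClassFile.entrySet()) {
            if (entry.getValue().contains(symbol)) {
                sources.add(sourceFor(entry.getKey()));
            }
        }
        return new ArrayList<>(sources);
    }

    public static void main(String[] args) {
        // Hypothetical data standing in for references parsed out of binary files.
        Map<String, Set<String>> refs = new LinkedHashMap<>();
        refs.put("Cart.class", new HashSet<>(Arrays.asList("Order", "String")));
        refs.put("Cart$Item.class", new HashSet<>(Arrays.asList("Order")));
        refs.put("Logger.class", new HashSet<>(Arrays.asList("String")));
        System.out.println(sourcesToCompile(refs, "Order"));
    }
}
```

The point of the filter is that only the selected source files need to be handed to the compiler, which then reports each use of the symbol together with its type and position information.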

When source code changes are made to a software system, the system and method of the present invention may be utilized to locate dependencies to such source code changes. When a source code change is received, the binary modules of the software system are parsed to determine which binary modules contain dependencies to the source code change. The corresponding source files of the binary modules are then retrieved. The compiler is used to identify each dependency present in the retrieved source code.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer system in which software-implemented processes of the present invention may be embodied.

FIG. 2 is a block diagram of a software system for controlling the operation of the computer system.

FIG. 3A is a block diagram of a Java development system suitable for implementing the present invention.

FIG. 3B is a block diagram of a virtual machine illustrated in the Java development system of FIG. 3A.

FIG. 4 illustrates a preferred interface of a Java-based visual development or programming environment provided by the Java development system.

FIGS. 5A-B comprise a single diagram illustrating at a high level the processes involved in refactoring a software application using the system and method of the present invention.

FIGS. 6A-B comprise a single flowchart illustrating a compiler-assisted refactoring method performed in accordance with the present invention.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

The following description will focus on the presently-preferred embodiment of the present invention, which is implemented in a desktop application operating in an Internet-connected environment running under a desktop operating system, such as the Microsoft® Windows operating system running on an IBM-compatible PC. The present invention, however, is not limited to any one particular application or any particular environment. Instead, those skilled in the art will find that the system and methods of the present invention may be advantageously embodied on a variety of different platforms, including Macintosh, Linux, BeOS, Solaris, UNIX, NextStep, FreeBSD, and the like. Therefore, the description of the exemplary embodiments that follows is for purposes of illustration and not limitation.

I. Computer-based Implementation

A. Basic system hardware (e.g., for desktop and server computers)

The present invention may be implemented on a conventional or general-purpose computer system, such as an IBM-compatible personal computer (PC) or server computer. FIG. 1 is a very general block diagram of an IBM-compatible system 100. As shown, system 100 comprises a central processing unit(s) (CPU) or processor(s) 101 coupled to a random-access memory (RAM) 102, a read-only memory (ROM) 103, a keyboard 106, a printer 107, a pointing device 108, a display or video adapter 104 connected to a display device 105, a removable (mass) storage device 115 (e.g., floppy disk, CD-ROM, CD-R, CD-RW, DVD, or the like), a fixed (mass) storage device 116 (e.g., hard disk), a communication (COMM) port(s) or interface(s) 110, a modem 112, and a network interface card (NIC) or controller 111 (e.g., Ethernet). Although not shown separately, a real-time system clock is included with the system 100, in a conventional manner.

CPU 101 comprises a processor of the Intel Pentium® family of microprocessors. However, any other suitable processor may be utilized for implementing the present invention. The CPU 101 communicates with other components of the system via a bi-directional system bus (including any necessary input/output (I/O) controller circuitry and other “glue” logic). The bus, which includes address lines for addressing system memory, provides data transfer between and among the various components. Description of Pentium-class microprocessors and their instruction set, bus architecture, and control lines is available from Intel Corporation of Santa Clara, Calif. Random-access memory 102 serves as the working memory for the CPU 101. In a typical configuration, RAM of sixty-four megabytes or more is employed. More or less memory may be used without departing from the scope of the present invention. The read-only memory (ROM) 103 contains the basic input/output system code (BIOS), a set of low-level routines in the ROM that application programs and the operating systems can use to interact with the hardware, including reading characters from the keyboard, outputting characters to printers, and so forth.

Mass storage devices 115, 116 provide persistent storage on fixed and removable media, such as magnetic, optical or magnetic-optical storage systems, flash memory, or any other available mass storage technology. The mass storage may be shared on a network, or it may be a dedicated mass storage. As shown in FIG. 1, fixed storage 116 stores a body of program and data for directing operation of the computer system, including an operating system, user application programs, driver and other support files, as well as other data files of all sorts. Typically, the fixed storage 116 serves as the main hard disk for the system.

In basic operation, program logic (including that which implements methodology of the present invention described below) is loaded from the removable storage 115 or fixed storage 116 into the main (RAM) memory 102, for execution by the CPU 101. During operation of the program logic, the system 100 accepts user input from a keyboard 106 and pointing device 108, as well as speech-based input from a voice recognition system (not shown). The keyboard 106 permits selection of application programs, entry of keyboard-based input or data, and selection and manipulation of individual data objects displayed on the screen or display device 105. Likewise, the pointing device 108, such as a mouse, track ball, pen device, or the like, permits selection and manipulation of objects on the display device. In this manner, these input devices support manual user input for any process running on the system.

The computer system 100 displays text and/or graphic images and other data on the display device 105. The video adapter 104, which is interposed between the display 105 and the system's bus, drives the display device 105. The video adapter 104, which includes video memory accessible to the CPU 101, provides circuitry that converts pixel data stored in the video memory to a raster signal suitable for use by a cathode ray tube (CRT) raster or liquid crystal display (LCD) monitor. A hard copy of the displayed information, or other information within the system 100, may be obtained from the printer 107, or other output device. Printer 107 may include, for instance, an HP LaserJet® printer (available from Hewlett-Packard of Palo Alto, Calif.), for creating hard copy images of output of the system.

The system itself communicates with other devices (e.g., other computers) via the network interface card (NIC) 111 connected to a network (e.g., Ethernet network, Bluetooth wireless network, or the like), and/or modem 112 (e.g., 56K baud, ISDN, DSL, or cable modem), examples of which are available from 3Com of Santa Clara, Calif. The system 100 may also communicate with local occasionally-connected devices (e.g., serial cable-linked devices) via the communication (COMM) interface 110, which may include an RS-232 serial port, a Universal Serial Bus (USB) interface, or the like. Devices that will be commonly connected locally to the interface 110 include laptop computers, handheld organizers, digital cameras, and the like.

IBM-compatible personal computers and server computers are available from a variety of vendors. Representative vendors include Dell Computers of Round Rock, Tex., Compaq Computers of Houston, Tex., and IBM of Armonk, N.Y. Other suitable computers include Apple-compatible computers (e.g., Macintosh), which are available from Apple Computer of Cupertino, Calif., and Sun Solaris workstations, which are available from Sun Microsystems of Mountain View, Calif.

B. Basic system software

Illustrated in FIG. 2, a computer software system 200 is provided for directing the operation of the computer system 100. Software system 200, which is stored in system memory (RAM) 102 and on fixed storage (e.g., hard disk) 116, includes a kernel or operating system (OS) 210. The OS 210 manages low-level aspects of computer operation, including managing execution of processes, memory allocation, file input and output (I/O), and device I/O. One or more application programs, such as client application software or “programs” 201 (e.g., 201a, 201b, 201c, 201d) may be “loaded” (i.e., transferred from fixed storage 116 into memory 102) for execution by the system 100.

System 200 includes a graphical user interface (GUI) 215, for receiving user commands and data in a graphical (e.g., “point-and-click”) fashion. These inputs, in turn, may be acted upon by the system 100 in accordance with instructions from operating system 210, and/or client application module(s) 201. The GUI 215 also serves to display the results of operation from the OS 210 and application(s) 201, whereupon the user may supply additional inputs or terminate the session. Typically, the OS 210 operates in conjunction with device drivers 220 (e.g., “Winsock” driver, Windows' implementation of a TCP/IP stack) and the system BIOS microcode 230 (i.e., ROM-based microcode), particularly when interfacing with peripheral devices. OS 210 can be provided by a conventional operating system, such as Microsoft® Windows 9x, Microsoft® Windows NT, Microsoft® Windows 2000, or Microsoft® Windows XP, all available from Microsoft Corporation of Redmond, Wash. Alternatively, OS 210 can be another operating system, such as one of the previously-mentioned operating systems.

The above-described computer hardware and software are presented for purposes of illustrating the basic underlying desktop and server computer components that may be employed for implementing the present invention. For purposes of discussion, the following description will present examples in which it will be assumed that there exists at least one computer running applications developed using the Java programming language. The present invention, however, is not limited to any particular environment or device configuration. In particular, use of the Java programming language is not necessary to the invention, but is simply used to provide a framework for discussion. Instead, the present invention may be implemented in any type of system architecture or processing environment capable of supporting the methodologies of the present invention presented in detail below.

C. Java development environment

Java is a simple, object-oriented language which supports multi-thread processing and garbage collection. Although the language is based on C++, a superset of C, it is much simpler. More importantly, Java programs are “compiled” into a binary format that can be executed on many different platforms without recompilation. A typical Java system comprises the following set of interrelated technologies: a language specification; a compiler for the Java language that produces bytecodes for an abstract, stack-oriented machine; a virtual machine (VM) program that interprets the bytecodes at runtime; a set of class libraries; a runtime environment that includes bytecode verification, multi-threading, and garbage collection; supporting development tools, such as a bytecode disassembler; and a browser (e.g., Sun's “HotJava” browser).

Shown in further detail in FIG. 3A, a Java development system 300 suitable for implementing the present invention includes a client 310 which employs a virtual machine 320 for executing programs. In particular, the client 310 executes a “compiled” (i.e., bytecode or pseudo-compiled) Java program 340, which has been created by compiling a Java source code program or script 305 with a Java compiler 330. Here, the Java source code program 305 is an application program written in the Java programming language; the pseudo-compiled program 340, on the other hand, comprises the bytecode emitted by the compiler 330. The virtual machine 320 includes a runtime interpreter for interpreting the Java bytecode program 340. During operation, the client 310 simply requests the virtual machine 320 to execute a particular Java compiled program.

Also shown at FIG. 3A is the modification of the development system 300 to implement the present invention. As shown, a refactoring module 335 is provided for compiler-assisted refactoring of a software program. The compiler-assisted refactoring operations of the system are described in detail below.

As shown in FIG. 3B, the virtual machine 320 comprises a class loader 321, a bytecode verifier 322, a bytecode interpreter 323, and runtime support libraries 324. The class loader 321 is responsible for unpacking the class file which has been requested by a client. Specifically, the class loader 321 will unpack different sections of a file and instantiate corresponding in-memory data structures. The class loader will invoke itself recursively for loading any superclasses of the current class which is being unpacked.

The bytecode verifier 322 verifies the bytecode as follows. First, it checks whether the class has the correct access level. Since the class will access other classes for invoking their methods, the bytecode verifier 322 must confirm that appropriate access is in place. Additionally, the bytecode verifier confirms that the bytecode which comprises the methods is not itself corrupt. In this regard, the bytecode verifier confirms that the bytecode does not change the state of the virtual machine (e.g., by manipulating pointers).

Once the bytecode has been verified, a “class initializer” method is executed. It serves, in effect, as a constructor for the class. The initializer is not a constructor in the sense that it is used to construct an instance of a class (an object). The class initializer, in contrast, initializes the static variables of the class. These static variables comprise the variables which are present only once (i.e., only one instance), for all objects of the class.
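The distinction between the class initializer and an instance constructor may be illustrated with a short sketch. The class names `Widget` and `ClassInitializerDemo` are hypothetical; the static block below is compiled into the class initializer and runs once, before the first instance is constructed, whereas the constructor runs once per object.

```java
// Hypothetical class used only for illustration.
final class Widget {
    // Static variables exist once for the class, not once per object.
    static final StringBuilder LOG = new StringBuilder();
    static int created;

    // Static initialization: executed a single time when the class is initialized.
    static {
        created = 0;
        LOG.append("class-init;");
    }

    // Instance constructor: executed for every object that is created.
    Widget() {
        created++;
        LOG.append("ctor;");
    }
}

final class ClassInitializerDemo {
    static String run() {
        new Widget();
        new Widget();
        return Widget.LOG.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());   // first call prints: class-init;ctor;ctor;
    }
}
```

However many `Widget` objects are created, the log contains a single `class-init;` entry, followed by one `ctor;` entry per object.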

Runtime support libraries 324 comprise functions (typically, written in C) which provide runtime support to the virtual machine, including memory management, synchronization, type checking, and interface invocation. At the client machine on which a Java application is to be executed, runtime support libraries 324 are included as part of the virtual machine; the libraries are not included as part of the Java application. The bytecode which is executed repeatedly calls into the runtime support libraries 324 for invoking various Java runtime functions.

In the currently preferred embodiment, the Java development system 300 may be provided by Borland JBuilder® 7.0, available from Borland Software Corporation of Scotts Valley, Calif. Further description of the development system 300 may be found in “Building Applications with JBuilder (Version 7),” available from Borland Software Corporation of Scotts Valley, Calif., the disclosure of which is hereby incorporated by reference. The following briefly describes the Java-based visual development interface provided by the system.

D. Development Interface

FIG. 4 illustrates a preferred interface of a Java-based visual development or programming environment 460 provided by the system. As shown, the programming environment 460 comprises a main window 461, a project pane 471, a structure pane 475, and a content pane 481 (showing the editor). The main window 461 itself includes a main menu 462 and a main toolbar 463. The main menu 462 lists user-selectable commands, in a conventional manner. For instance, the main menu 462 invokes “File”, “Edit”, “Search”, “View” submenus, and the like. Each submenu lists particular choices which the user can select. Working in conjunction with the main menu, the main toolbar 463 provides the user with shortcuts to the most common commands from the main menu, such as opening or saving a project. The main toolbar 463 is displayed under the main menu and is composed of smaller toolbars grouped by functionality. The main toolbar is configurable by the user for including icons for most of the menu commands.

To develop a software program in the development environment, a user typically first creates a “project” to organize the program files and maintain the properties set for the program. The project pane 471 contains a list of the open project(s) and a tree view of the contents of the active project. As shown at FIG. 4, the active project file is the top node in the project pane 471 and the content pane 481 displays the contents of the active project file. In the currently preferred embodiment, the project pane 471 also includes a project pane toolbar 472 which includes buttons for closing a project, adding files or packages (e.g., by opening an “Add Files or Packages to Project” dialog box), removing files from a project, and refreshing the project (e.g., searching for source packages for the project).

The structure pane 475 displays the structure of the file currently selected in the content pane 481. The file structure is displayed in the form of a tree showing the members and fields in the selected file. When appropriate, the structure pane 475 also displays an “Errors” folder (not shown) containing any syntax errors in the file as well as an “Imports” folder (as shown at the top of the structure pane 475) containing a list of imported packages. In addition to providing a view of the structure of the class, the structure pane facilitates navigating to a class, or its methods or members, in the source code.

The content pane 481 displays all open files in a project as a set of tabs. Files may be opened in the content pane 481 by selecting the file from the project pane 471. The name of each open file is displayed on file tabs 482 at the top of the content pane. As shown, multiple file tabs 482 may provide access to various open files. A user may select a file tab (e.g., the “Welcome Frame” as shown at FIG. 4) to display a particular file in the content pane 481. The content pane provides a full-featured editor that provides access to text (i.e., source code) in a given project.

The content pane 481 provides access to various file views as well as status information by way of file view tabs 485 and a file status bar 486. Each of the file view tabs 485 shown at the bottom of the content pane provides a different view of the open file. The file view tabs 485 are context sensitive. Only tabs appropriate to the file open in the content pane appear below its window. For instance, a visually designable .java file typically has several tabs, including “Source”, “Design”, “Bean”, “UML”, “Doc”, and “History” as shown at FIG. 4. A user may select the “Source” tab to view source code or the “UML” tab to view Unified Modeling Language (UML) diagrams for a class or package. The content pane 481 also includes a file status bar 486 which is displayed immediately above the file view tabs 485. The file status bar 486 displays information specific to the current file, such as the name of the file, the cursor location (line number and column), and the insertion mode in a text file.

The following description will focus on those features of the development system 300 which are helpful for understanding the methodology of the present invention for compiler-assisted refactoring of a software application.

II. Compiler-assisted Refactoring

A. Overview of compiler-assisted refactoring

The present invention provides native-compiler support for refactoring. Here, the approach is to use the underlying development system's actual compiler (e.g., JBuilder's actual Java compiler) to automate and facilitate refactoring of software systems or applications, instead of using a special-purpose diminutive compiler. In this manner, the refactoring implementation may employ the underlying development system's own proven, robust compiler. As previously described, during the process of making structured changes to a software system (i.e., refactoring), a user must identify all instances in which a particular term or symbol is used as well as the context in which such term or symbol is used. The system and methodology of the present invention assists a user in properly parsing and understanding the context in which particular terms or symbols are utilized, thereby enabling refactoring to be performed more efficiently and reliably.

A large software application typically involves tens or hundreds of thousands of lines of code. A system of this size is generally composed of a large number of smaller components. For example, a system or application written in the Java programming language typically includes a large number of different packages and classes that are contained in various different files. The various component packages and classes are linked together and compiled into binary form in order to implement and run the software system (e.g., on a Java virtual machine). Accordingly, an initial problem to be addressed in changing the system (i.e., refactoring) is to locate the source files that contain the term(s) or symbol(s) of interest. For example, a user may wish to rename a particular class written in the Java programming language, or one or more of its elements.

The system of the present invention enables the source files containing the dependencies (e.g., the targeted class) to be identified by evaluating the binary output of a compilation of the system. In this example involving a particular Java class, this evaluation process looks for references to the targeted class, enabling identification of one or more source candidate files which may contain the items of interest. These source candidate files may then be further inspected using the compiler constructed in accordance with the present invention, as hereinafter described. In this first step of identifying the target source files from a binary representation of the system, the debug information contained in the class files is used to identify source candidates. After a list of source candidates has been identified from the binary representations, the compiler is then used to narrow the list of source candidates by identifying references to a certain class as discussed below.

After the candidate source files have been identified, the system of the present invention is used to generate a syntactical representation of the candidate source files, in the form of a “parse tree”. Next, the compiler's typing phase is used to generate an attributed (or annotated) parse tree which contains annotations to the references. The annotated parse tree not only contains the short name references to the object or symbol of interest, but also associated type information to better enable a determination to be made about whether a particular reference is, in fact, a symbol of interest. The short name reference indicates a potential match; however, it generally cannot be determined whether or not it is in fact a match without examination of the context. After the annotated parse tree has been generated, it may then be traversed to locate references to particular symbols or objects of interest and evaluate not only the short name, but also the associated type information. The annotated parse tree also includes position information to facilitate locating and making changes to the applicable source file. In effect, the parse tree contains a mapping to the text (i.e., the source code) that can be used to make changes to the text file.
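The traversal described above may be illustrated with a minimal sketch. The `AnnotatedNode` and `ReferenceFinder` types below are hypothetical and are not part of JBuilder or of the described compiler; they merely assume that each node carries a short name, a resolved (fully qualified) type, and a source offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotated parse-tree node (an assumption for illustration):
// short name, compiler-resolved type, and position in the source file.
final class AnnotatedNode {
    final String shortName;
    final String resolvedType;  // null when the node is not a symbol reference
    final int offset;           // character offset of the reference in the source
    final List<AnnotatedNode> children = new ArrayList<>();

    AnnotatedNode(String shortName, String resolvedType, int offset) {
        this.shortName = shortName;
        this.resolvedType = resolvedType;
        this.offset = offset;
    }

    AnnotatedNode add(AnnotatedNode child) {
        children.add(child);
        return this;
    }
}

final class ReferenceFinder {
    // Returns the offsets of all nodes whose short name AND resolved type match.
    static List<Integer> find(AnnotatedNode root, String shortName, String qualifiedType) {
        List<Integer> hits = new ArrayList<>();
        collect(root, shortName, qualifiedType, hits);
        return hits;
    }

    private static void collect(AnnotatedNode node, String shortName,
                                String qualifiedType, List<Integer> hits) {
        // The short name alone is only a potential match; the type decides.
        if (shortName.equals(node.shortName) && qualifiedType.equals(node.resolvedType)) {
            hits.add(node.offset);
        }
        for (AnnotatedNode child : node.children) {
            collect(child, shortName, qualifiedType, hits);
        }
    }
}
```

In this sketch, two references that share the short name `List` but resolve to `java.util.List` and `java.awt.List` are distinguished by the type annotation, and only the offset of the matching reference is returned.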

B. Summary of basic approach

The approach of the present invention for improved refactoring may be summarized as follows. When a developer or user compiles the source code of a software system, the resulting object files contain a compact representation of the usage of symbols inside the source files. In the case of a system written in Java, the text or character names of symbols generally survive compilation, which is not the case in many other programming languages. Additional dependencies that are contained in the source files, but are not part of the object files, are also saved in auxiliary files alongside the object files for later retrieval. This information typically consists of declared but unused program entities that may have to be updated in case of a refactoring.

The following pseudocode is a high level description of how the system of the present invention discovers candidate source files for refactoring using the repository and then uses the system's compiler to find the actual occurrences of the symbols or objects of interest within the candidate source files.

 1: given an object (method, field, class) OBJ, which is defined in
 2: class CLASSNAME:
 3:
 4: public void findAllOccurences(Obj obj, String classname) {
 5:   // get an instance of Repository,
 6:   // which is a database containing dependencies
 7:   // between classes
 8:   Repository repository = ((JBProject)project).getRepository();
 9:   // get the database entry for class classname
10:   ClassEntry classEntry = repository.getClassEntry(classname);
11:   // get the reverse dependencies, giving us a list
12:   // of candidate classes that are using class classname
13:   Collection candidates = classEntry.getReverseDependencies();
14:   // Get a set of sourceCandidates
15:   Set sourceCandidates = new HashSet();
16:   for (Iterator i = candidates.iterator(); i.hasNext();) {
17:     ClassEntry candidate = (ClassEntry)i.next();
18:     SourceEntry sourceCandidate = candidate.getSource();
19:     sourceCandidates.add(sourceCandidate);
20:   }
21:
22:   // go through all source candidates to determine if
23:   // any of them use the object obj
24:   for (Iterator i = sourceCandidates.iterator(); i.hasNext();) {
25:     SourceEntry sourceCandidate = (SourceEntry)i.next();
26:     doSearch(sourceCandidate, obj);
27:   }
28: }
29:
30: // find all occurrences of obj within source file source
31: void doSearch(SourceEntry source, Obj obj) {
32:   // attribute the source. This will use the compiler
33:   // to resolve all names within that source to accurately
34:   // determine occurrences of obj
35:   source.attribute();
36:
37:   AST[] result = source.findObjOccurences(obj);
38:   for (int i = 0; i < result.length; i++) {
39:     reportMatchFound(source, result[i]);
40:   }
41: }

As illustrated in the above pseudocode, for a given symbol or object (obj) from an object file (e.g., a .class file), the method of the present invention locates all positions in the source files where the symbol is used. As shown at line 8, an instance of the repository, which is a database containing relationships between classes, is obtained. At lines 10-19, the set of candidate source files that have a dependency on the identified class is determined from the repository. After the candidate source files have been identified, these source files are further examined to determine which of these files have a dependency on the given object as shown at lines 22-26. For each candidate source file, the compiler is used to attribute the source as provided at lines 30-35, and each occurrence of the object found in the attributed source is then reported as provided at lines 37-40. The result is an annotated list of all occurrences of the object inside the source files where the object is used. The processes involved in the refactoring of a software application using the system and method of the present invention will now be described in greater detail.

C. Processes involved in refactoring an application

FIGS. 5A-B comprise a single diagram 500 illustrating at a high level the processes involved in refactoring a software application using the system and method of the present invention. The processes involved in compiling a software application into binary format for execution on a computer are first described. The processes involved in refactoring the application are then illustrated in typical sequential order. The following discussion will use as an example the refactoring of a software application written in the Java programming language. However, the reference to Java is only to provide a framework for discussion and the system and method of the present invention may also be used for refactoring of software systems written in a variety of programming languages.

As shown, the process begins at block 501 with one or more source files or listings of a software system. The source files may, for instance, comprise a particular software application that has been developed to perform particular tasks (e.g., a transaction processing system). The source files may have been developed using a visual development system such as JBuilder. Alternatively, the source files may be developed using a text editor or another type of development tool. After the source files for the system have been developed, the developer compiles the source files using a compiler. The compiler takes the source files and “compiles” or generates binary modules or files from the source files as shown at block 502 at FIG. 5A.

Compilation of the source files typically involves several related operations. First, the input stream is scanned to break the source code files into a sequence of tokens or meaningful groups of characters. After scanning, the sequence of tokens is parsed to generate an abstract syntax tree or “parse tree” representation of the source code. The parsing process also generally includes resolution of symbol declarations as well as semantic analysis to verify the source code as a sequence of valid statements or expressions in the applicable programming language. The compiler also builds a symbol table and other supporting data structures for annotating each node in the parse tree with parse or type information. The output from these scanning and parsing operations is a parse tree in which nodes are annotated with either type or symbol information. Following these scanning and parsing operations, the annotated parse tree is usually optimized by a code optimizer to optimize data references globally within a program. After optimization, a code generator generates instructions or binary code for the target processor. Code generation may also include additional machine-dependent optimization of the program. Following compilation, the object code may also be “linked” or combined with runtime libraries (e.g., standard runtime library functions) to generate executable program(s), which may be executed by a target processor (e.g., processor 101 of FIG. 1). The runtime libraries include previously compiled standard routines, such as graphics, input/output (I/O) routines, startup code, math libraries, and the like. The result of the above process is that the high level source code files have been translated into machine readable binary code which may then be executed.

As shown at block 503, the compiled binary modules or files (e.g., .class files in the case of a program written in Java) represent an executable software application that may then be deployed and operated. After the software system has been compiled, a user may, from time to time, wish to make changes to the system (e.g., to add new features or for other reasons). When changes are to be made to the software application, the user may utilize the system and methodology of the present invention for compiler-assisted refactoring. In response to a request from a developer or user for refactoring of a program or system, the binary modules (e.g., .class files) are read, processed, and placed into a repository as shown at block 504. As part of this process, the binary modules are parsed and a cross reference list is created. The cross reference list contains a list of references to the given binary modules in other binary modules (e.g., other .class files) of the application. The cross reference list is stored in the repository together with the binary modules. The symbols or references contained in a given class file are discerned or determined. This reference information is then entered in the repository, so that for a given source file it can be determined what high level cross references (e.g., dependencies among Java .class files) are made among the component source files (e.g., .java files) of the system. This repository information may then be used to determine the list of source candidates to be examined as described below.

After reference information has been determined, the list of forward references for a given class which are stored in the repository may be used to obtain a list of the other classes that refer to such given class as shown at block 505 at FIG. 5. By reversing all of the forward references to a particular class, all of the classes that are dependent on this particular class may be determined. For example, if one wishes to change the name of a particular .class file, this information enables all other .class files that are using (or referencing) this particular file to be identified. In the currently preferred embodiment in the Java programming language, the reference information stored in the repository includes the class name, the source name, and all the .class files that are used by this class (i.e., all of the forward references). In this manner, the candidate source files which are using this class may be determined and retrieved as provided at block 506.
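The reversal of forward references may be sketched as follows. This is an illustrative assumption about one possible in-memory form of the repository data (a map from each class name to the set of class names it uses); the `ReverseDependencies` class is hypothetical and does not reflect the actual repository implementation.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: inverts forward references (class -> classes it uses)
// into reverse references (class -> classes that use it).
final class ReverseDependencies {
    static Map<String, Set<String>> invert(Map<String, Set<String>> forward) {
        Map<String, Set<String>> reverse = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : forward.entrySet()) {
            String user = entry.getKey();
            for (String used : entry.getValue()) {
                // Record that 'user' depends on 'used'.
                reverse.computeIfAbsent(used, k -> new TreeSet<>()).add(user);
            }
        }
        return reverse;
    }
}
```

Looking up the targeted class in the inverted map then yields the candidate classes, whose recorded source names identify the candidate source files.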

The candidate source files are then fed into the compiler. As shown at block 507, the compiler's parser is used to create “parse tree” representations of these candidate source files. The parse trees that are created include position information for each node of the parse trees indicating the location in the source file of a given symbol or reference. The compiler's type attribution is also used to determine type information for such reference as shown at block 508. After the annotated parse trees containing type information and line information have been generated, the parse trees are traversed to examine each instance of a particular symbol or element of interest (i.e., each relevant node of the parse tree) as shown at block 509. Based upon the text name and the type information of a given symbol in the annotated parse trees, it is determined as to whether or not that particular symbol (or element) is a dependency that needs to be changed as part of the refactoring process. If this information indicates that a particular element needs to be modified, the position information is used (i.e., the position information in the source file annotated on a parse tree node) to locate that element in the source file. The source file is then retrieved and the change is applied to that particular element of the source file as shown at block 510. Additional elements or symbols may then be examined and modified in a similar fashion until the refactoring process has been completed. After the refactoring process has been completed, the modifications can be saved as shown at block 511. For most types of refactorings, the currently preferred embodiment provides the user with the option to view the modifications made during the refactoring process before saving the modifications. After the modifications have been saved, the user may also recompile the program to verify successful implementation of the modifications, if desired.
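As a minimal sketch of how position information might be used to apply a change, the hypothetical `PositionalRename` class below replaces a name at a list of character offsets. It is an assumption for illustration, not the described implementation. Offsets are processed from last to first so that earlier offsets remain valid after each replacement.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: applies a rename at known source offsets.
final class PositionalRename {
    static String apply(String source, List<Integer> offsets,
                        String oldName, String newName) {
        List<Integer> sorted = new ArrayList<>(offsets);
        sorted.sort(Comparator.reverseOrder());  // highest offset first
        StringBuilder text = new StringBuilder(source);
        for (int offset : sorted) {
            // Guard against a source file that no longer matches the parse tree.
            if (!source.startsWith(oldName, offset)) {
                throw new IllegalStateException("source changed at offset " + offset);
            }
            text.replace(offset, offset + oldName.length(), newName);
        }
        return text.toString();
    }
}
```

Because only the listed offsets are touched, an unrelated occurrence of the same text (for example, inside a string literal) is left unchanged.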

D. Refactoring Code Symbols

1. Types of Refactoring Supported

The currently preferred embodiment of the present invention supports a number of different types of refactorings, including “optimize imports”, “rename refactoring”, “move refactoring”, “change parameters”, “introduce variable”, “extract method”, and “surround with try/catch”. Each of these different types of refactorings will now be briefly described.

a) Optimize imports

An “optimize imports” refactoring rewrites and reorganizes import statements according to the custom settings established in the properties for a project (e.g., properties established by the developer or user of a particular application). An optimize imports refactoring also removes any import statements that are no longer used in the code.

b) Rename refactoring

A “rename refactoring” applies a new name to a package, class, method, field, local variable, or property, ensuring that all references to that name are correctly handled. Rename refactoring a constructor renames the class. Rename refactoring is far more than a search and replace task; references must all be accounted for and properly handled while patterns must be recognized so that overloaded names are handled correctly. For example, when a rename refactoring is performed on an overloaded class name, the class's new name must be reflected in the class declaration and in every instance of that class and every other reference to that class. However, the new name must only be reflected in the target class, not in the other classes that share its original name or their declarations, instances, references, etc.

For packages, rename refactoring renames the specified package. Package and import statements in class files are updated. The package, subpackages, and class source files are moved to the new source directory and the old directory is deleted. In the currently preferred embodiment, the following code symbols can be rename refactored.

Code Symbol | Refactoring allowed | Description
Package | Rename | Rename refactoring a package renames the package and the entire sub-tree of packages. The package name cannot already exist in the project.
Class, inner class or interface | Rename, Move | Rename refactoring an outer public class renames the source file. If the source file name already exists in the current package, the refactoring is prevented. If the class is not the outer public class and there is another class of the desired new name, the class is not renamed. Move refactoring a class moves that class to a new package if the new package does not already contain a source file of the new name. The class must be the top level public class.
Method | Rename | Rename refactoring a method renames the method and all references to that method. The method can be renamed in all classes that this class inherits from or in all classes in the hierarchy for the class. A forwarding method can be created.
Field | Rename | Rename refactoring a field renames the field to a new name. The new name cannot already exist in the class that declared the original name. If there are scope conflicts between the new name and the old name, then the this keyword is added to the beginning of the new field name. A warning is displayed if the new name overrides or is overridden by an existing field in a superclass or subclass.
Local Variable | Rename | Rename refactoring a local variable renames the variable to the new name. The new name cannot already exist in the class that declared the original name.
Property | Rename | Rename refactoring a property renames the property as well as its getter and setter. The new name cannot already exist in the class that declared the original name.

c) Move refactoring

A “move refactoring” is available for moving classes. Move refactoring moves a specified class to a new package. Move refactoring is only allowed on a top-level public class. The package the class is being moved to cannot already contain a source file of the new name. The refactoring must update the declaration of the class, as well as all the usages of that class.

d) Change parameters

A “change parameters” refactoring allows a user to add, rename, delete and re-order a method's parameters. A newly added parameter may be edited before the dialog creating the parameter is closed; however, an existing parameter cannot be edited.

e) Extract Method

An “extract method” refactoring turns a selected code fragment into a method. The system moves the extracted code outside of the current method, determines the needed parameter(s), generates local variables if necessary, and determines the return type. It inserts a call to the new method in the code where the code fragment resided.
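A hypothetical before-and-after pair may help illustrate the effect of an extract method refactoring. The class and method names below are invented for illustration; the actual generated names and layout may differ.

```java
// Before: the summing loop is the fragment selected for extraction.
final class InvoiceBefore {
    static int total(int[] prices, int shipping) {
        int subtotal = 0;
        for (int price : prices) {
            subtotal += price;
        }
        return subtotal + shipping;
    }
}

// After: the fragment has become a method. Its parameter (prices) and
// return type (int) are determined from the variables the fragment uses.
final class InvoiceAfter {
    static int total(int[] prices, int shipping) {
        int subtotal = sum(prices);  // call inserted where the fragment resided
        return subtotal + shipping;
    }

    private static int sum(int[] prices) {
        int subtotal = 0;            // local variable generated for the new method
        for (int price : prices) {
            subtotal += price;
        }
        return subtotal;
    }
}
```

Both versions compute the same result; only the structure of the code changes.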

f) Introduce Variable

An “introduce variable” refactoring allows the result of a complex expression, or part of the expression, to be replaced with a temporary variable name that explains the purpose of the expression or sub-expression.
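As an illustrative sketch with invented names, the following shows a compound condition before and after two sub-expressions are replaced with explanatory temporary variables.

```java
// Before: the intent of each sub-expression must be inferred by the reader.
final class ShippingBefore {
    static boolean qualifies(int weightKg, int distanceKm) {
        return weightKg * distanceKm < 5000 && weightKg < 30;
    }
}

// After: each sub-expression is named by a temporary variable.
final class ShippingAfter {
    static boolean qualifies(int weightKg, int distanceKm) {
        boolean withinFreightLimit = weightKg * distanceKm < 5000;
        boolean lightEnough = weightKg < 30;
        return withinFreightLimit && lightEnough;
    }
}
```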

g) Surround with try/catch

A “surround with try/catch” refactoring adds a try/catch statement around a selected block of code. It detects all checked exceptions in a block and adds a specific catch block for each checked exception.
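The result of such a refactoring might resemble the following sketch. The selected block is assumed to contain two statements that throw the checked exceptions `URISyntaxException` and `ParseException`. The `SurroundExample` class and the bodies of the catch blocks are invented for illustration; a tool would typically generate placeholder handlers.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

final class SurroundExample {
    static String describe(String uriText, String dateText) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        try {
            // Selected block: each statement can throw a checked exception.
            URI uri = new URI(uriText);           // URISyntaxException
            Date date = format.parse(dateText);   // ParseException
            return "ok: " + uri.getHost() + " on " + format.format(date);
        } catch (URISyntaxException e) {
            // One catch block per checked exception detected in the block.
            return "bad URI";
        } catch (ParseException e) {
            return "bad date";
        }
    }
}
```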

2. Access to Refactoring Features

The refactoring features of the currently preferred embodiment are accessible from the development system's editor context menu, editor menu, search menu, and UML diagram context menu. Before a refactoring, a user can view, by category, all locations in the current project where the selected symbol is referenced, and navigate to the symbol's definition. If the refactoring cannot be completed, the user interface provides warning and error messages to help explain the problem. In the currently preferred embodiment, warnings do not stop the refactoring. However, if an error is encountered, the refactoring is prevented. For example, a refactoring might be prevented if a file is read-only or if a new name that is selected already exists. Single file refactorings (for example, extract method and introduce variable) do not display output unless there are errors or warnings.

Exemplary refactoring tools provide extensive information to the user about the refactoring process, including:

Limitations reporting | Checks for conditions where a refactoring might encounter problems. For example, determines if needed dependency information is missing or out-of-date, if a file is read-only, or if a class file does not exist.
References discovery | Finds all source files containing dependencies. The exact source position is located.
Validation | Determines if the new name is legal. For example, the name might already be in use or contain illegal syntax.
Source tree updating | Physically moves a directory or a file within the source tree for a class move refactoring or a package rename refactoring. The system also updates import statements as needed for any dependencies.
Reference renaming | Renames references with the new name.

3. Setting up for References Discovery and Refactoring

There are several steps that a user may need to perform before performing a refactoring. The currently preferred embodiment provides an option for loading all library relationships, allowing the system to discover all references. To find all references to a symbol, the system should be compiled with this option for loading references from project libraries enabled. Loading all library references is not required; it may slow down both the compilation and the refactoring process. However, if the library references are not loaded, some references to a symbol may not be discovered using the “Find References” command described below. Additionally, before performing a refactoring, it is advisable to ensure that the class files are up-to-date by compiling the source files.

4. Learning About a Symbol Before Refactoring

Before a user performs a refactoring, such as a rename refactoring or a move refactoring, the currently preferred embodiment provides several ways for learning about a given symbol. For example, a developer can use the system to find the definition of a symbol as well as all references to the symbol; that is, all source files that use the symbol.

A user can also issue a “Find Definition” context menu command to determine where a given symbol is defined. To find a symbol's definition, a user is required to compile the application before using the “Find Definition” command to locate the file containing the definition. In addition, the class that includes the definition must be on the import path or in the same package as the symbol that is of interest. In response to the “Find Definition” command, the source file where the symbol is defined is opened in the editor of the visual development system. If the symbol is an instance of a class, the class is opened in the editor, with the cursor placed on the class declaration. If the symbol is a method, the class that defines the method is opened in the editor, with the cursor placed at the start of the method signature. If the symbol is a variable, and the variable is defined in the open class, the cursor moves to the variable definition. If the variable is public and defined in another class, the class is opened in the editor with the cursor placed on the definition.

Before refactoring, a developer might also want to find all source files using a selected symbol. To locate all references to a symbol, the application must be compiled and the references must be loaded from project libraries as previously described. If a developer has compiled the source files and loaded the references, he or she may select the symbol in the editor and enter the “Find References” command. References located in response to the command are displayed on the “Search Results” tab of the message pane. Class and method references are sorted by category. Field and local variable references are sorted by file name. The “Find References” command cannot currently be used to locate references for a package or a property. The following table details, by code symbol, the reference categories that can be displayed in the Search Results tab:

Code Symbol | Reference Category
Class, inner class or interface | Ancestors - Classes from which this class directly inherits.
Class, inner class or interface | Descendents - Classes that directly descend from this class.
Class, inner class or interface | Type references - Classes that declare or instantiate the type of object for the class.
Class, inner class or interface | Descendents type references - Classes that are descendents or use descendents of the type of object for the class.
Class, inner class or interface | Member references - Members in this class.
Class, inner class or interface | Descendents member references - Members in classes that descend from this class.
Method or constructor | Declarations - Locations where this method is declared.
Method or constructor | Direct usages - Locations in directly instantiated classes that call this method.
Method or constructor | Indirect usages - Locations in ancestor and descendent classes that are referenced.
Field and local variable | Writes - Locations where the field or local variable is written.
Field and local variable | Reads - Locations where the field or local variable is read.

After references for a class have been located, a reference category in the Search Results tab may be selected to obtain additional information about that reference, including a list of the source files referring to the class. A user may then select a source file and a reference to go directly to the reference in the editor. Additional information on a method may be obtained in a similar fashion. For field or local variable references, the writes and reads for the selected symbol may be displayed.

5. Viewing Changes Before Refactoring

Before some types of refactoring are performed, the system provides the opportunity to view potential changes to be made before committing the refactoring. A user may wish to preview changes to ensure that the changes that will be made are appropriate before committing the changes. For instance, when the rename or move commands on the context menu are utilized, rename or move dialog boxes are displayed. These dialog boxes provide an option for viewing changes before committing them. If this preview option is selected, potential changes are displayed on the “Refactoring” tab of the message pane. The lines that will potentially change as a result of the refactoring are displayed by file name, sorted in the order of discovery. The following table details the type of information displayed for refactoring:

Code Symbol | Type of refactoring | Information displayed
Package | Rename | Source files that contain a class reference that will change.
Class, inner class or interface | Rename | Line locations in the current source file where the class is declared; includes constructors. Also lists source code locations where the class is used.
Class | Move | Source code locations where the class' current package is declared or imported. Indicates if a package in the list of imports is added or deleted. (An import statement is added for any dependencies the class has on the package it is being moved from).
Method | Rename | Source code locations where the method is declared and used. Indicates if a forwarding method is created.
Field and local variable | Rename | Source code locations where the symbol is declared and called.
Property | Rename | Source code locations where the property is declared and where accompanying getter and setter are declared and called.

A mapping is also provided which enables a user to go directly to a selected reference in a source file. After reviewing the changes, a user may issue a command (using the Refactoring tab) to commit the changes and complete the refactoring. A status bar displays a message providing information about the progress of the refactoring. If the files are edited before the refactoring is complete, the system may require the application to be recompiled, since the binary files (e.g., .class files) and the source files (i.e., .java files) would not be consistent. In this event, a status message is displayed indicating that the refactoring cannot be completed because the files have changed. When the refactoring is completed, a message is displayed indicating that the refactoring is complete.

After refactoring, the contents of the Refactoring tab do not change. The original lines of source code are still displayed, so that the changes made by the refactoring can be observed. A user may select an original source code line to go to the line that was changed. After the changes have been reviewed, they may be saved to make them permanent.

6. Performing Different Types of Refactoring

a) Optimize imports

An optimize imports refactoring may be used to rewrite and reorganize import statements according to custom settings in the project properties. A user can customize the order of imports on the import style page of the “Project Properties” dialog box of the currently preferred embodiment. To customize the order of import statements for new projects, a “Project|Default Project Properties” dialog box may be used to make modifications to the import style page.

Different import style options may be set for the refactoring. An “Always Import Classes” option may be selected to avoid adding package import statements to an application. If this option is selected, individual classes will be directly imported. When this option is used, the “Package Import Threshold” is ignored. The “Package Import Threshold” sets how many classes must be imported from a package before rewriting the class import statements into a package import statement. Classes up to the import threshold are imported using individual class import statements. When the threshold is exceeded, the entire package is imported. For example, when three is entered in the import threshold field, and four or more classes are used from a package, the entire package will be imported.
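The threshold rule described above can be sketched as follows. This is an illustrative sketch only, not the code of the preferred embodiment; the class ImportStyleSketch, the method optimizeImports, and its parameter names are hypothetical and are introduced solely to model the “Always Import Classes” and “Package Import Threshold” settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of the import-style rule; all names are assumptions.
final class ImportStyleSketch {

    /**
     * @param usedClasses            fully qualified names of the classes a source file uses
     * @param alwaysImportClasses    models the "Always Import Classes" option
     * @param packageImportThreshold models the "Package Import Threshold" setting
     * @return the import statements to write, sorted by package and class name
     */
    static List<String> optimizeImports(List<String> usedClasses,
                                        boolean alwaysImportClasses,
                                        int packageImportThreshold) {
        // Group the used classes by package; sorted maps give a stable order.
        Map<String, TreeSet<String>> byPackage = new TreeMap<>();
        for (String fqn : usedClasses) {
            int dot = fqn.lastIndexOf('.');
            if (dot < 0) {
                continue; // class in the default package: nothing to import
            }
            String pkg = fqn.substring(0, dot);
            byPackage.computeIfAbsent(pkg, k -> new TreeSet<>()).add(fqn);
        }

        List<String> imports = new ArrayList<>();
        for (Map.Entry<String, TreeSet<String>> entry : byPackage.entrySet()) {
            boolean thresholdExceeded = entry.getValue().size() > packageImportThreshold;
            if (!alwaysImportClasses && thresholdExceeded) {
                // More classes than the threshold: import the entire package.
                imports.add("import " + entry.getKey() + ".*;");
            } else {
                // Otherwise import each class individually.
                for (String fqn : entry.getValue()) {
                    imports.add("import " + fqn + ";");
                }
            }
        }
        return imports;
    }
}
```

With a threshold of three, a file using four classes from java.util would receive the single statement "import java.util.*;", whereas a file using three classes would keep three individual class imports. With the always-import-classes flag set, the threshold is ignored and four individual imports are produced.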

b) Rename refactoring a package

A package may also be rename refactored from the editor or a UML class or package diagram. Rename refactoring a package renames the package and the entire subtree of packages to the new root package name. It also moves the package and all class names to the new name and source directory. A user may elect to view references before committing the rename refactoring of a package. After the refactoring is completed, the existing source directory structure for that package is deleted. The package rename refactoring is prevented by the system if the new package name already exists or is invalid.
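The effect of a package rename on the package subtree and on the source directory layout can be illustrated with a short sketch. The class PackageRenameSketch and its two methods are hypothetical names chosen for illustration, and the mapping of a package name to a directory assumes the conventional Java layout in which each package segment is a directory.

```java
// Hypothetical sketch of renaming a package root and its subtree; names are assumptions.
final class PackageRenameSketch {

    /**
     * Returns the name a package has after oldRoot is renamed to newRoot.
     * Packages outside the renamed subtree are returned unchanged.
     */
    static String renamed(String pkg, String oldRoot, String newRoot) {
        if (pkg.equals(oldRoot)) {
            return newRoot;
        }
        // The trailing dot prevents "com.shopping" from matching root "com.shop".
        if (pkg.startsWith(oldRoot + ".")) {
            return newRoot + pkg.substring(oldRoot.length());
        }
        return pkg;
    }

    /** Source directory of a package, relative to the source root (conventional layout). */
    static String sourceDirectory(String pkg) {
        return pkg.replace('.', '/');
    }
}
```

For example, renaming com.shop to com.store maps the subpackage com.shop.cart to com.store.cart, whose sources would then reside in com/store/cart, while an unrelated package such as com.shopping is left untouched.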

c) Rename refactoring a class

A class, inner class, or interface can be rename refactored from either the editor or the UML class diagram. Rename refactoring for an outer public class renames all declarations of the class and all usages of the class and the source file. If a constructor is selected, the rename refactoring renames the class. The changes to be made in the refactoring may be previewed as described above, if desired. The rename refactoring is prevented by the system if the class identifier is invalid. In addition, if the class is not the outer public class and there is another non-outer public class of the desired new name, the class is not renamed.

d) Move refactoring a class

A class can be moved to a new package (i.e., move refactored) using the editor or the UML class diagram. Move refactoring a class moves that class to a new package if the new package does not already contain a source file of the new name. The package and import statements in the class source file, as well as in all classes that reference the moved class, are updated. An import statement is added for any dependencies the class has on the package it is being moved from. The class being moved must be the top level public class. The class is not moved if the class identifier is invalid or if the source file name already exists in the new package. If a class is moved to a package that does not exist, the new package is automatically created and added to the application or project. The system also creates the new source directory and moves the class to the new directory. Package names and import statements are also updated. Additionally, if the package no longer contains any classes, the package is removed from the project and its source directory is deleted.

e) Rename refactoring a method

A method may also be rename refactored. Rename refactoring of a method may be initiated from either the editor or a UML diagram of the development system. Rename refactoring a method renames the method, all declarations of that method, and all usages of that method. The method can be renamed in the entire hierarchy or from the selected class down in the hierarchy. A forwarding method, which passes the method call on to the new method, can also be created using a “Create Forwarding Method” option, allowing a public API to remain intact. Rename refactoring a method does not rename overloaded methods; that is, methods with the same name but with different method signatures. A “Refactor Ancestors” option, when enabled, renames methods in classes that the current class inherits from. This option may be deactivated to rename the method only in this class and in its descendants. Rename refactoring of a method is barred if the new method name already exists in the file where it is declared. If the name exists in other files in the direct inheritance, a warning is issued. If the refactoring is performed with the Refactor Ancestors option enabled, a warning is also displayed if the method exists, but is not in the editable source path. For example, if the method exists in a library, the method will not be refactored, as libraries are read-only.
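The result of a method rename with a forwarding method may be illustrated by the following sketch. The class Account and the method names getBal and computeBalance are hypothetical and serve only to show the shape of the source code after the refactoring; they are not taken from the preferred embodiment.

```java
// Hypothetical result of renaming getBal() to computeBalance()
// with the "Create Forwarding Method" option enabled.
final class Account {
    private final long cents;

    Account(long cents) {
        this.cents = cents;
    }

    // The renamed method: its declaration and usages now carry the new name.
    long computeBalance() {
        return cents;
    }

    // Forwarding method: keeps the old name available and passes the call
    // on to the new method, so existing callers of the public API still compile.
    long getBal() {
        return computeBalance();
    }

    // Overloaded method with a different signature: left untouched,
    // since the rename does not apply to overloads.
    long getBal(long feeCents) {
        return cents - feeCents;
    }
}
```

Callers that were compiled against the old name continue to work through the forwarding method, while the overloaded getBal(long) keeps both its name and its behavior.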

f) Rename refactoring a local variable

In the currently preferred embodiment, a local variable can be rename refactored only from the editor. A local variable rename refactoring changes the declaration and usages of that variable to the new name. A method parameter is also treated as a local variable for these purposes. The rename refactoring is prevented if the new name exists in the class that declared the original variable.

g) Rename refactoring a field

A field can be rename refactored from either the editor or a UML class diagram. A field rename refactoring changes the declarations and usages of that field to the new name. The refactoring may not be completed if the new name exists in the class that declared the field. If there are scope conflicts between the new name and the old name, the this keyword is added to the new field name. A warning is displayed if the new field overrides or is overridden by an existing field in a superclass or subclass.
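A scope conflict of the kind mentioned above arises when the new field name is already in use as a parameter or local variable in a method that accesses the field. The following sketch shows one plausible outcome; the class Counter and its member names are hypothetical.

```java
// Hypothetical result of renaming the field "count" to "total" in a class
// where a method parameter is already named "total".
final class Counter {
    private int total; // was: private int count;

    void add(int total) {
        // Before the rename: count = count + total;
        // After the rename the field is qualified with "this" so that it
        // is not shadowed by the parameter of the same name.
        this.total = this.total + total;
    }

    int value() {
        // No conflict in this method, so no qualifier is needed.
        return total;
    }
}
```

Without the qualifier, the statement in add would read and assign only the parameter, silently changing the behavior of the program.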

h) Rename refactoring a property

A property can be renamed from a UML class diagram. A property rename refactoring changes all declarations of that property, as well as its getter and setter methods. A rename refactoring of a property cannot be completed if the new name exists in the class that declared the original property.

i) Changing method parameters

A user can also add, rename, delete, and reorder a method's parameters from the editor or from a UML diagram. A newly added parameter can be edited before the “Change Parameters” dialog box is closed; however, an existing parameter cannot be edited. The “Refactor Ancestors” option (on by default) refactors methods in classes from which this class inherits. The Refactor Ancestors option may be deactivated to refactor the method only in this class and in its descendants. A user can then choose to add a forwarding method by clicking the “Create Forwarding Method” option. A changing method parameters refactoring is prevented if the new method signature already exists in the file where it is declared. If the signature exists in other files in the direct inheritance, a warning is issued. If the refactoring is performed with the Refactor Ancestors option enabled, a warning can also be displayed if the same method exists, but is not in the editable source path. For example, if the method exists in a library, it will not be refactored, as libraries are read-only. In addition, the refactoring may be prevented if the new parameter name or type is not a valid Java identifier.

j) Extracting a method

An extract method refactoring turns a selected code fragment into a method. A user can access this refactoring from the editor. The extracted code is moved outside of the current method, the needed parameter(s) are determined, local variables are generated if necessary, and the return type is determined. A call to the new method is also inserted in the code where the code fragment resided. This refactoring may not be allowed if more than one variable is written to or if it is read after the block.
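By way of illustration, the following sketch shows a method before and after an extract method refactoring. The class InvoiceSketch and the method names are hypothetical; both versions are kept side by side only so that their results can be compared.

```java
// Hypothetical before/after illustration of an extract method refactoring.
final class InvoiceSketch {

    // Before: the summing loop is an inline fragment of the method.
    static double totalBefore(double[] prices, double taxRate) {
        double subtotal = 0;
        for (double price : prices) {
            subtotal += price;
        }
        return subtotal * (1 + taxRate);
    }

    // After: the fragment is replaced by a call to the extracted method.
    static double totalAfter(double[] prices, double taxRate) {
        double subtotal = sum(prices);
        return subtotal * (1 + taxRate);
    }

    // Extracted method: "prices" is the needed parameter, "subtotal" becomes
    // a generated local variable, and the return type is double because the
    // value written in the fragment is read after it.
    private static double sum(double[] prices) {
        double subtotal = 0;
        for (double price : prices) {
            subtotal += price;
        }
        return subtotal;
    }
}
```

In this example only one variable (subtotal) is written in the fragment and read afterward, so it can be returned from the new method.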

k) Introducing a variable

An introduce variable refactoring may be used to replace the result of a complex expression, or part of the expression, with a temporary variable name. The temporary name is also known as an explaining variable, which explains the purpose of the expression or sub-expression. A temporary variable with the selected variable name is generated and initialized in the correct place. The original expression is replaced with the newly generated variable.
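An introduce variable refactoring can be illustrated as follows. The class ShippingSketch, the variable name distanceCharge, and the numeric constants are hypothetical and chosen only to make the example concrete.

```java
// Hypothetical before/after illustration of an introduce variable refactoring.
final class ShippingSketch {

    // Before: the same sub-expression appears twice and its purpose is unclear.
    static double costBefore(double weightKg, double distanceKm) {
        return 5.0 + weightKg * 0.5 * distanceKm / 100.0
                + (weightKg * 0.5 * distanceKm / 100.0 > 20.0 ? 10.0 : 0.0);
    }

    // After: the sub-expression is replaced by an explaining variable that is
    // declared and initialized before its first use.
    static double costAfter(double weightKg, double distanceKm) {
        double distanceCharge = weightKg * 0.5 * distanceKm / 100.0;
        return 5.0 + distanceCharge + (distanceCharge > 20.0 ? 10.0 : 0.0);
    }
}
```

The two methods compute the same value; the second names the intermediate result so that the intent of the expression is apparent.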

l) Surrounding a block with try/catch

A user can perform a surrounding a block with try/catch refactoring to place a try/catch statement around a selected block of code. The system detects all checked exceptions in the block and adds a specific catch block for each checked exception. This refactoring is available from the editor. If the selected block is not a valid block of statements, an error will be displayed in the Refactoring tab and the refactoring will be prevented.
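The following sketch shows what a block might look like after it has been surrounded with try/catch. The class TryCatchSketch and the method readNumber are hypothetical, and the handler bodies are placeholders; the text above does not specify what the generated handlers contain.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical result of surrounding a two-statement block with try/catch.
final class TryCatchSketch {

    static Number readNumber(String text) {
        Number value = null;
        try {
            // Selected block: each statement can throw a different checked exception.
            String line = new BufferedReader(new StringReader(text)).readLine(); // IOException
            value = NumberFormat.getInstance(Locale.US).parse(line);             // ParseException
        } catch (IOException ex) {
            // Placeholder handler: one catch block per checked exception.
            value = null;
        } catch (ParseException ex) {
            // Placeholder handler for the second checked exception.
            value = null;
        }
        return value;
    }
}
```

Because the block can raise two distinct checked exceptions, two specific catch blocks are shown rather than a single handler for a common supertype.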

7. Undoing a Refactoring

A completed refactoring operation can be easily reversed. In the currently preferred embodiment, an “Undo” button is provided on the refactoring toolbar to undo all changes made in a refactoring. The refactoring should be reversed before any changes are made to other files and before the Refactoring tab is closed. When a refactoring is performed that does not display output in the Refactoring tab, changes can be reversed with an “Edit|Undo” command.

8. Saving Refactoring

After a refactoring has been successfully completed, the changes may be saved (i.e., the modified files in the software system may be saved) using a “File|Save All” command. If a version control system is utilized, the changes may be committed or checked into the version control system. If the software system is closed before the refactoring changes are saved, a “Save Modified Files” dialog box is displayed enabling the files that are to be saved to be selected. If the refactored files are not saved, the software system source files revert to their state before the refactoring(s). It should be noted that refactoring can be applied to files that may not be open in the editor at the time of the refactoring. The system automatically saves changes to those files so the source code is not in an inconsistent state.

E. Detailed methods of operation

FIGS. 6A-B comprise a single flowchart illustrating a compiler-assisted refactoring method 600 performed in accordance with the present invention. To illustrate the operation of the present invention, FIGS. 6A-B and the following discussion use as an example a refactoring of a software application written in the Java language containing a number of component source files or programs. However, Java is only one of the possible programming languages with which the present invention may be advantageously utilized. Accordingly, the references to Java in the following discussion are for purposes of illustration and not limitation. In addition, refactoring of a software system may involve applying a number of different changes to the system (e.g., changing a number of different symbols or class names). Accordingly, the steps described below for making a given change may be repeated for applying a number of different changes to a software system, as desired.

The method begins at step 601 with the receipt of one or more source file(s) that have been developed or created to perform particular tasks. The source files may, for instance, comprise .java files for a software application that has been developed in the Java programming language for installation on a particular environment (e.g., an e-commerce application to be installed on a Web server). At step 602, the compiler is initially invoked to compile these source file(s) into a set of binary files (e.g., .class files). The compilation process includes parsing the source files, applying type attribution, and generating binary code (e.g., Java bytecode) as previously described. It should be noted that in the case of Java source files, the name of the source files (e.g., class names) as well as other reference information is retained as part of these compiled .class files. The result of the compilation process is that the source files (e.g., .java files) have been translated into machine readable binary code (e.g., .class files) which may then be executed.

After the source file(s) have been compiled, a user may subsequently wish to make changes to the application. For instance, a user may want to perform a rename refactoring to change a particular class name of a component of the application. At step 603, a request is received from a user for refactoring of a program or system. In response to this request, at step 604 the system of the present invention reads and parses the binary modules of the application in order to place entries into a repository. In particular, the above binary files (e.g., .class files) of the application developed or created by the user are examined and the system of the present invention generates information about each binary file (e.g., .class file) and places this information in the repository. In the currently preferred embodiment, information that is placed into the repository includes the element name (e.g., the class name), the source file, and the forward references from this class to other classes. The above process examines the user-developed .class files and does not examine standard libraries and other standard files or components provided as part of the underlying Java programming language.

After the user source files of the application have been examined and entries have been placed in the repository describing each of the components of the application, at step 605 the information in the repository is used to identify binary files which may contain dependencies (i.e., which may contain the elements or symbols of interest such as the class name in this example). This process involves using the forward references in the repository to isolate candidate binary files (e.g., .class files) which may contain dependencies. At step 606, the corresponding source files are retrieved for the candidate binary files identified at step 605.
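Steps 604 through 606 can be sketched as a small in-memory repository. This is a simplified illustration under stated assumptions, not the repository of the preferred embodiment: the class RepositorySketch, its Entry type, and its methods are hypothetical, and the forward references are assumed to have already been extracted from each .class file by some other means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical repository of per-binary-file entries; all names are assumptions.
final class RepositorySketch {

    /** One entry per binary file: element name, source file, forward references. */
    static final class Entry {
        final String elementName;
        final String sourceFile;
        final Set<String> forwardReferences;

        Entry(String elementName, String sourceFile, Set<String> forwardReferences) {
            this.elementName = elementName;
            this.sourceFile = sourceFile;
            this.forwardReferences = forwardReferences;
        }
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>();

    /** Step 604: record the information generated for one binary file. */
    void add(String elementName, String sourceFile, String... forwardReferences) {
        entries.put(elementName, new Entry(elementName, sourceFile,
                new LinkedHashSet<>(Arrays.asList(forwardReferences))));
    }

    /**
     * Steps 605-606: use the forward references to isolate the binary files that
     * may depend on the changed element, and return their source files.
     */
    List<String> candidateSourceFiles(String changedElement) {
        List<String> candidates = new ArrayList<>();
        for (Entry entry : entries.values()) {
            boolean defines = entry.elementName.equals(changedElement);
            boolean references = entry.forwardReferences.contains(changedElement);
            if ((defines || references) && !candidates.contains(entry.sourceFile)) {
                candidates.add(entry.sourceFile);
            }
        }
        return candidates;
    }
}
```

The candidates are deliberately a superset of the true dependencies: a forward reference shows only that a class mentions the changed element somewhere, which is why the candidate source files are subsequently handed to the compiler for precise analysis.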

After the candidate source files have been retrieved, at step 607 these source files are read into the compiler. The compiler is used to parse the candidate source files (i.e., those identified at step 606) and create parse tree representations of these source files. The parse tree representations of the candidate source files that are generated contain short names (e.g., text names) and position information which references the locations of these short names (or symbols), but do not yet contain type information. At step 608, the compiler's type attribution is used to annotate nodes of these parse trees with type information. More particularly, each node of the parse tree includes a field for storing type information. The compiler adds type information to nodes of the parse tree by traversing (or walking) the parse tree and building look-up tables based upon the surrounding environment (e.g., local variables of a method in which the short name is located). This type information is necessary to enable the short name to be tested to determine whether the short name is, in fact, a local variable or another class or method from outside the local context (i.e., a dependency on another class). Steps 607 and 608 are similar to the previously described compilation process used to generate the binary files, except in this situation only the substeps of parsing the source files and type attribution are used. The substep of code generation is unnecessary. At the completion of the type attribution process, the parse tree representations of the candidate source files include short names (or symbols), type names, and position information.
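The type attribution of step 608 can be sketched with a greatly simplified parse tree. This is not the attribution logic of an actual Java compiler; the class AttributionSketch, its Node type, and the string encoding of type information are hypothetical and serve only to show how a walk with scoped look-up tables distinguishes a local variable from a symbol defined outside the local context.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, greatly simplified type attribution; all names are assumptions.
final class AttributionSketch {

    /** Parse tree node: short name and position, plus a type field filled in later. */
    static final class Node {
        final String shortName;
        final int line;
        final int column;
        String type; // null until attribution has run
        final Map<String, String> declaredLocals = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        Node(String shortName, int line, int column) {
            this.shortName = shortName;
            this.line = line;
            this.column = column;
        }
    }

    /** Walks the tree and annotates each node using the surrounding scopes. */
    static void attribute(Node node, Deque<Map<String, String>> scopes,
                          Map<String, String> outerSymbols) {
        node.type = resolve(node.shortName, scopes, outerSymbols);
        scopes.push(node.declaredLocals); // enter the scope this node introduces
        for (Node child : node.children) {
            attribute(child, scopes, outerSymbols);
        }
        scopes.pop(); // leave the scope
    }

    private static String resolve(String name, Deque<Map<String, String>> scopes,
                                  Map<String, String> outerSymbols) {
        // Innermost scope first: a local variable hides an outer symbol of the same name.
        for (Map<String, String> scope : scopes) {
            String localType = scope.get(name);
            if (localType != null) {
                return "local " + localType;
            }
        }
        // Otherwise the name refers to something outside the local context, if known.
        return outerSymbols.get(name);
    }
}
```

In this sketch, a use of the short name total inside a method that declares a local variable total is annotated as a local, whereas the same short name in another method is annotated with the outer symbol. Only the latter is a dependency on the element being refactored.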

At step 609, the annotated parse trees are traversed to locate nodes that match the short name(s) and type(s) of interest. The position and line number of the matching nodes can then be obtained from the repository. The position information (i.e., the line number in the source file in which the matching node is located) is used to locate this code in the appropriate source file. At step 610, the modifications are applied to the underlying source files (i.e., modifications to the textual form of the source code). After the modifications are applied to the underlying source files, the refactoring is complete. The user can then save the changes made during the refactoring at step 611. For most types of refactoring, the currently preferred embodiment enables the changes made during the refactoring to be viewed by the user before the changes are saved. After the changes have been saved, the user may, if desired, recompile the source code to verify successful refactoring and update the binary files. The user may also proceed to make additional modifications to the application, which may include repeating the above steps for refactoring another element of the application.
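Steps 609 and 610 can be sketched as a filter over annotated uses followed by a textual edit at each matching position. The class RenameApplySketch, its Use type, and the convention of 1-based line numbers with 0-based columns are assumptions made for this illustration; the annotated uses are supplied directly rather than produced by a compiler.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of matching annotated uses and editing the source text.
final class RenameApplySketch {

    /** One annotated use: short name, type information, and position. */
    static final class Use {
        final String shortName;
        final String type;
        final int line;   // 1-based line number (assumed convention)
        final int column; // 0-based column of the first character (assumed convention)

        Use(String shortName, String type, int line, int column) {
            this.shortName = shortName;
            this.type = type;
            this.line = line;
            this.column = column;
        }
    }

    static List<String> rename(List<String> sourceLines, List<Use> uses,
                               String oldName, String typeOfInterest, String newName) {
        // Step 609: keep only the uses whose short name AND type both match.
        List<Use> matches = new ArrayList<>();
        for (Use use : uses) {
            if (use.shortName.equals(oldName) && use.type.equals(typeOfInterest)) {
                matches.add(use);
            }
        }
        // Edit from the last position to the first so that columns of
        // not-yet-edited matches on the same line remain valid.
        matches.sort(Comparator.comparingInt((Use u) -> u.line)
                .thenComparingInt(u -> u.column).reversed());

        // Step 610: apply the modifications to the textual form of the source.
        List<String> result = new ArrayList<>(sourceLines);
        for (Use use : matches) {
            String text = result.get(use.line - 1);
            result.set(use.line - 1, text.substring(0, use.column) + newName
                    + text.substring(use.column + oldName.length()));
        }
        return result;
    }
}
```

A plain text search for the old name would also rewrite local variables that merely share the short name. Matching on the type information leaves such lines untouched.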

While the invention is described in some detail with specific reference to a single preferred embodiment and certain alternatives, there is no intent to limit the invention to that particular embodiment or those specific alternatives. For instance, the foregoing discussion uses a refactoring of a software application written in the Java language to illustrate the operations of the present invention. However, Java is only one of the possible programming languages with which the present invention may be advantageously utilized. Accordingly, the references to refactoring a Java application in the foregoing discussion are for purposes of illustration and not limitation. Those skilled in the art will appreciate that modifications may be made to the preferred embodiment without departing from the teachings of the present invention.

What is claimed is:
 1. A method for locating dependencies to source code changes to a software system, the method comprising: receiving a source code change to the software system; parsing binary modules of the software system to determine which binary modules contain dependencies to said source code change; retrieving corresponding source files of said binary modules; and using a compiler to identify each dependency present in the retrieved source code.
 2. The method of claim 1, further comprising: modifying each dependency based upon said source code change.
 3. The method of claim 1, wherein said binary modules comprise compiled Java bytecode.
 4. The method of claim 1, wherein said binary modules comprise Java .class files.
 5. The method of claim 1, wherein said parsing step includes placing information about each binary module in a repository.
 6. The method of claim 5, wherein the information placed in the repository includes a particular binary module containing the source code change and other binary modules which reference this particular binary module.
 7. The method of claim 1, wherein said step of using a compiler to identify each dependency includes the substeps of: parsing the source files to create parse tree representations of the source files; and applying type attribution to annotate nodes of the parse tree representations with type information.
 8. The method of claim 7, further comprising: annotating the parse tree representations with position information for each dependency in the source files.
 9. The method of claim 7, further comprising: traversing the parse tree representations to identify each dependency.
 10. The method of claim 1, further comprising: providing short name, type information, and position information for each dependency.
 11. The method of claim 1, further comprising: providing a mapping to each dependency to enable a user to navigate to each dependency.
 12. The method of claim 1, wherein said step of using a compiler to locate each dependency includes using a compiler used for development of the software system.
 13. The method of claim 1, wherein said step of using a compiler to locate each dependency includes using a Java compiler.
 14. The method of claim 1, wherein the source code change includes a selected one of package name, class name, interface, method, field, variable, and property.
 15. A computer-readable medium having computer-executable instructions for performing the method of claim 1.
 16. A downloadable set of computer-executable instructions for performing the method of claim 1.
 17. An improved method for compiler-assisted refactoring of a software application, the method comprising: receiving a request for refactoring of a software application, said refactoring comprising a change to a given symbol of the software application; parsing binary files of the software application to identify the binary files containing references to the given symbol; retrieving source files of the identified binary files; using a compiler to generate a list of uses of the given symbol in the software application, said list including name, type information, and location of the given symbol; and applying changes to the software application based on said list of uses.
 18. The method of claim 17, further comprising: displaying the changes applied to the software application; and in response to a user command, saving the applied changes.
 19. The method of claim 17, wherein said parsing step includes generating a list of all binary files containing references to the given symbol.
 20. The method of claim 17, wherein the given symbol includes a selected one of package name, class name, interface, method, field, variable, and property.
 21. The method of claim 17, wherein said parsing step includes the substeps of: determining a particular binary module defining the given symbol; and determining all other binary modules referencing said particular binary module.
 22. The method of claim 17, wherein said step of using a compiler to generate a list of uses includes the substeps of: using the compiler to parse the source files of the identified binary files to locate all references to the given symbol; and using the compiler for type attribution of the located references.
 23. The method of claim 17, wherein said list of uses includes an annotated parse tree containing text name, type information, and position information.
 24. The method of claim 17, further comprising: providing a mapping to the location of each given symbol in each source file facilitating access to each given symbol.
 25. The method of claim 17, wherein said compiler used for the compiler-assisted refactoring method is a compiler used to develop the software application.
 26. A computer-readable medium having computer-executable instructions for performing the method of claim 17.
 27. A downloadable set of computer-executable instructions for performing the method of claim 17.
 28. A method for assisting a user with changing source code for a software program, the method comprising: receiving a user-provided change to the source code for said software program; parsing binary modules of the software program to determine which particular binary modules are dependent on said user-provided change; retrieving corresponding source files of the binary modules for those particular binary modules determined to be dependent on said user-provided change; and with a compiler, identifying each particular portion of the source code in the retrieved source code files that is affected by said user-provided change.
 29. The method of claim 28, further comprising: automatically applying changes to said each particular portion of the source code affected by said user provided change.
 30. The method of claim 28, wherein said binary modules comprise compiled Java bytecode.
 31. The method of claim 28, wherein said parsing step includes placing information about each binary module in a repository.
 32. The method of claim 31, wherein the information placed in the repository includes a particular binary module containing said user-provided change and other binary modules which reference this particular binary module.
 33. The method of claim 31, wherein the information in the repository is used to retrieve source files dependent on the binary module containing said user-provided change.
 34. The method of claim 28, wherein said step of identifying each particular portion of source code includes the substeps of: parsing the source files to create parse tree representations of the source files; applying type attribution to annotate nodes of the parse tree representations with type information; and traversing the parse tree representations to identify each particular portion affected by said user-provided change.
 35. The method of claim 28, further comprising: providing short name, type information, and position information for each particular portion of the source code affected by said user-provided change.
 36. The method of claim 28, further comprising: providing a mapping to each particular portion of the source code affected by said user provided change.
 37. A computer-readable medium having computer-executable instructions for performing the method of claim 28.
 38. A downloadable set of computer-executable instructions for performing the method of claim 28.
 39. A system for facilitating a modification to source code of a software program, the system comprising: a compiler for compiling source code into binary modules that comprise said software program; a refactoring module for: parsing the binary modules of said software program to determine which particular binary modules are dependent on said modification to source code; retrieving the source code of the particular binary modules dependent on said modification; and invoking the compiler for identifying particular portions of the retrieved source code affected by said modification.
 40. The system of claim 39, further comprising: applying said modification to said particular portions of the retrieved source code.
 41. The system of claim 39, wherein said binary modules comprise compiled Java bytecode.
 42. The system of claim 39, wherein said parsing step includes placing information about each binary module in a repository.
 43. The system of claim 42, wherein the information in the repository includes a particular binary module containing said modification and other binary modules which reference this particular binary module.
 44. The system of claim 39, wherein the information in the repository is used to retrieve source files dependent on the binary module containing said modification.
 45. The system of claim 39, wherein invoking the compiler includes using the compiler for type attribution of said particular portions of the retrieved source code.
 46. The system of claim 39, wherein the compiler, when invoked by the refactoring module, performs the steps of: creating parse tree representations of the retrieved source code; annotating the parse tree representations with type information; and traversing the annotated parse tree representations to identify each particular portion of the retrieved source code affected by said modification.
 47. The system of claim 39, further comprising: providing a mapping to each identified particular portion of the source code affected by said modification.